AITopics | alternative hypothesis

The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $\mathcal O(n^2)$ to $\mathcal O(n \log n)$.
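
To make the complexity claim concrete, here is a minimal sketch of one way the unbiased MMD² estimate can be computed exactly in $\mathcal O(n \log n)$ for univariate data under the Laplacian kernel $k(a, b) = \exp(-|a - b| / \sigma)$. It relies on a standard sort-and-recurrence identity and is not taken from the paper; the function names, the bandwidth parameter `sigma`, and the pooled-sample trick for the cross term are assumptions made for illustration. It covers only the statistic, not the variance estimators the abstract discusses.

```python
import numpy as np


def pair_sum_laplacian(z, sigma):
    """Sum of exp(-|z_i - z_j| / sigma) over all pairs i < j.

    After sorting, A_j = sum_{i<j} exp(-(z_j - z_i) / sigma) satisfies
    A_j = exp(-(z_j - z_{j-1}) / sigma) * (A_{j-1} + 1),
    so the total costs O(n log n) for the sort plus O(n) for the scan.
    """
    z = np.sort(np.asarray(z, dtype=float))
    decay = np.exp(-np.diff(z) / sigma)  # every factor lies in (0, 1]
    acc = 0.0
    total = 0.0
    for d in decay:
        acc = d * (acc + 1.0)
        total += acc
    return total


def mmd2_unbiased_fast(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate (U-statistic form) in O(n log n)."""
    n, m = len(x), len(y)
    wxx = pair_sum_laplacian(x, sigma)
    wyy = pair_sum_laplacian(y, sigma)
    # Pairs in the pooled sample = within-x + within-y + cross pairs.
    # The subtraction can lose some precision; acceptable for a sketch.
    cross = pair_sum_laplacian(np.concatenate([x, y]), sigma) - wxx - wyy
    return 2 * wxx / (n * (n - 1)) + 2 * wyy / (m * (m - 1)) - 2 * cross / (n * m)


def mmd2_unbiased_naive(x, y, sigma=1.0):
    """Reference O(n^2) implementation, used only to check the fast one."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    kxx = np.exp(-np.abs(x[:, None] - x[None, :]) / sigma)
    kyy = np.exp(-np.abs(y[:, None] - y[None, :]) / sigma)
    kxy = np.exp(-np.abs(x[:, None] - y[None, :]) / sigma)
    # The diagonals of kxx and kyy are all ones and must be excluded.
    return (kxx.sum() - n) / (n * (n - 1)) + (kyy.sum() - m) / (m * (m - 1)) - 2 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=500)
    y = rng.normal(0.5, 1.0, size=300)  # imbalanced sample sizes
    print(mmd2_unbiased_fast(x, y), mmd2_unbiased_naive(x, y))
```

The two functions should agree up to floating-point error, and the imbalanced sizes in the usage example show that the recurrence does not require equal sample sizes.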

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Machine Learning

2601.13874

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing

Neural Information Processing Systems, Dec-25-2025, 00:45:39 GMT

Conditional independence (CI) testing is a fundamental task in statistics and machine learning, but its effectiveness is hindered by the challenges posed by high-dimensional conditioning variables and limited data samples. This article introduces a novel testing approach to address these challenges and enhance control of the type I error while achieving high power under alternative hypotheses. The proposed approach incorporates a computationally efficient classifier-based conditional mutual information (CMI) estimator, capable of capturing intricate dependence structures among variables. To approximate a distribution encoding the null hypothesis, a $k$-nearest-neighbor local sampling strategy is employed. An important advantage of this approach is its ability to operate without assumptions about distribution forms or feature dependencies. Furthermore, it eliminates the need to derive asymptotic null distributions for the estimated CMI and avoids dataset splitting, making it particularly suitable for small datasets.
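
The $k$-nearest-neighbor local sampling idea can be sketched as follows, assuming the goal is to test whether X is independent of Y given Z. Each $X_i$ is replaced by the X value of a randomly chosen point among the $k$ nearest neighbors of $Z_i$, which roughly preserves the relation between X and Z while breaking any extra link to Y. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, neighbors are found by brute force and exclude the point itself, and a toy absolute-correlation statistic stands in for the classifier-based CMI estimator.

```python
import numpy as np


def knn_indices(z, k):
    """Indices of the k nearest neighbors of each row of z (self excluded)."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # assumption: a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_local_resample(x, z, k, rng):
    """Draw x_tilde[i] uniformly from the x values of the k neighbors of z[i]."""
    nbrs = knn_indices(z, k)
    pick = rng.integers(0, k, size=len(nbrs))
    return np.asarray(x)[nbrs[np.arange(len(nbrs)), pick]]


def local_sampling_pvalue(x, y, z, statistic, k=5, n_draws=200, seed=0):
    """Monte Carlo p-value: compare the observed statistic to resampled ones."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y, z)
    null = [statistic(knn_local_resample(x, z, k, rng), y, z) for _ in range(n_draws)]
    exceed = sum(s >= observed for s in null)
    return (1 + exceed) / (1 + n_draws)


def toy_statistic(x, y, z):
    """Placeholder dependence measure (NOT a CMI estimator)."""
    return abs(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(300, 2))
    x = z[:, 0] + 0.5 * rng.normal(size=300)
    y = z[:, 0] + 0.5 * rng.normal(size=300)  # X and Y depend only through Z
    print(local_sampling_pvalue(x, y, z, toy_statistic))
```

Because the null distribution is simulated by resampling, no asymptotic null distribution is needed and the full dataset is used for every draw, which matches the motivation given in the abstract.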

conditional independence testing, k-nearest-neighbor local sampling, name change, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.65)

Locally Most Powerful Bayesian Test for Out-of-Distribution Detection using Deep Generative Models

Neural Information Processing Systems, Dec-24-2025, 08:56:50 GMT

Several out-of-distribution (OOD) detection scores have recently been proposed for deep generative models because directly thresholding the likelihood for OOD detection has been shown to be problematic. In this paper, we propose a new OOD score based on a Bayesian hypothesis test called the locally most powerful Bayesian test (LMPBT). The LMPBT is locally most powerful in that the alternative hypothesis (the representative parameter for the OOD sample) is specified to maximize the probability that the Bayes factor exceeds the evidence threshold in favor of the alternative hypothesis, provided that the parameter specified under the alternative hypothesis lies in a neighborhood of the parameter specified under the null hypothesis. That is, under this neighborhood condition, the test with the proposed alternative hypothesis maximizes the probability of correctly detecting OOD samples. We also propose numerical strategies for more efficient and reliable computation of the LMPBT for practical application to deep generative models. Evaluations of the LMPBT on various benchmark datasets show that its OOD detection performance is superior to that of existing OOD detection methods.

hypothesis, out-of-distribution detection, powerful bayesian test, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

Machine-Learning-Assisted Comparison of Regression Functions

Yan, Jian, Li, Zhuoxi, Ning, Yang, Chen, Yong

arXiv.org Machine Learning, Oct-29-2025

We revisit the classical problem of comparing regression functions, a fundamental question in statistical inference with broad relevance to modern applications such as data integration, transfer learning, and causal inference. Existing approaches typically rely on smoothing techniques and are thus hindered by the curse of dimensionality. We propose a generalized notion of kernel-based conditional mean dependence that provides a new characterization of the null hypothesis of equal regression functions. Building on this reformulation, we develop two novel tests that leverage modern machine learning methods for flexible estimation. We establish the asymptotic properties of the test statistics, which hold under both fixed- and high-dimensional regimes. Unlike existing methods that often require restrictive distributional assumptions, our framework only imposes mild moment conditions. The efficacy of the proposed tests is demonstrated through extensive numerical studies.

artificial intelligence, machine learning, regression function, (14 more...)

2510.24714

Country:

North America > United States > Pennsylvania (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Anchor-based Maximum Discrepancy for Relative Similarity Testing

Zhou, Zhijian, Peng, Liuhua, Tian, Xunye, Liu, Feng

arXiv.org Artificial Intelligence, Oct-14-2025

Relative similarity testing aims to determine which of two distributions, P or Q, is closer to an anchor distribution U. Existing kernel-based approaches often test relative similarity with a fixed kernel under a manually specified alternative hypothesis, e.g., Q is closer to U than P. Although kernel selection is known to be important for kernel-based testing methods, the manually specified hypothesis poses a significant challenge for kernel selection in relative similarity testing: once the hypothesis is specified, we can always find a kernel such that the hypothesis is rejected. This makes relative similarity testing ill-defined when we want to select a good kernel after the hypothesis is specified. In this paper, we address this challenge by learning a proper hypothesis and a kernel simultaneously, instead of learning a kernel after manually specifying the hypothesis. We propose an anchor-based maximum discrepancy (AMD), which defines the relative similarity as the maximum discrepancy between the distances of (U, P) and (U, Q) in a space of deep kernels. Based on AMD, our test proceeds in two phases. In Phase I, we estimate the AMD over the deep kernel space and infer the potential hypothesis. In Phase II, we assess the statistical significance of the potential hypothesis, where we propose a unified testing framework to derive thresholds for tests over the different possible hypotheses from Phase I. Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets. Code is publicly available at: https://github.com/zhijianzhouml/AMD.
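
A rough sketch of the Phase I idea, under strong simplifying assumptions: a small grid of Gaussian bandwidths stands in for the deep kernel space, the distance for each pair is a biased MMD² estimate, and the discrepancy is taken as the signed gap MMD²(U, P) − MMD²(U, Q) with the largest absolute value. None of these choices is confirmed by the abstract, all names are hypothetical, and Phase II (the significance threshold) is omitted; the authors' repository linked above is the reference for the actual method.

```python
import numpy as np


def mmd2_biased(a, b, bandwidth):
    """Biased (V-statistic) MMD^2 estimate with a Gaussian kernel."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)

    def gram(p, q):
        d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))

    return gram(a, a).mean() + gram(b, b).mean() - 2.0 * gram(a, b).mean()


def phase_one(u, p, q, bandwidths):
    """Pick the kernel with the largest |MMD^2(U,P) - MMD^2(U,Q)|.

    Returns the chosen bandwidth, the signed gap, and the inferred
    hypothesis: "Q" if Q looks closer to U, otherwise "P".
    """
    gaps = [mmd2_biased(u, p, h) - mmd2_biased(u, q, h) for h in bandwidths]
    best = int(np.argmax(np.abs(gaps)))
    closer = "Q" if gaps[best] > 0 else "P"  # positive gap: P is farther from U
    return bandwidths[best], gaps[best], closer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.normal(0.0, 1.0, size=200)
    p = rng.normal(3.0, 1.0, size=200)  # far from the anchor
    q = rng.normal(0.0, 1.0, size=200)  # same distribution as the anchor
    print(phase_one(u, p, q, bandwidths=[0.5, 1.0, 2.0]))
```

Selecting the kernel and the direction of the hypothesis from the same maximization is what distinguishes this setup from fixing the hypothesis first; a valid test would still need thresholds that account for that selection, which is the role the abstract assigns to Phase II.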

artificial intelligence, hypothesis, machine learning, (14 more...)

2510.10477

Country: